Graph Library #1521

Marsella8 · 2024-10-10T04:25:16Z

Description of changes:

Extending testing functionality for graph library
Bug fixing
General clean up

Related Issues:

Linked Issues:

Issue #

Issues closed by this PR:

Closes #

This change is

codecov · 2024-10-10T04:54:59Z

Codecov Report

Attention: Patch coverage is 89.47368% with 72 lines in your changes missing coverage. Please review.

Project coverage is 79.05%. Comparing base (1d5140d) to head (8564b8b).

Files with missing lines	Patch %	Lines
.../undirected/algorithms/get_connected_components.cc	0.00%	41 Missing ⚠️
...h/series_parallel/series_parallel_decomposition.cc	66.66%	12 Missing ⚠️
.../utils/graph/instances/hashmap_undirected_graph.cc	37.50%	10 Missing ⚠️
lib/utils/src/utils/graph/algorithms.cc	0.00%	3 Missing ⚠️
.../graph/instances/unordered_set_undirected_graph.cc	0.00%	2 Missing ⚠️
...rc/utils/graph/undirected/undirected_edge_query.cc	0.00%	2 Missing ⚠️
...aph/undirected/algorithms/get_neighboring_nodes.cc	0.00%	1 Missing ⚠️
...tils/src/utils/graph/undirected/undirected_edge.cc	0.00%	1 Missing ⚠️

Additional details and impacted files

@@                Coverage Diff                @@
##           repo-refactor    #1521      +/-   ##
=================================================
+ Coverage          78.16%   79.05%   +0.89%     
=================================================
  Files                860      873      +13     
  Lines              27994    28703     +709     
  Branches             770      772       +2     
=================================================
+ Hits               21881    22691     +810     
+ Misses              6113     6012     -101

Flag	Coverage Δ
unittests	`79.05% <89.47%> (+0.89%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
lib/utils/include/utils/containers/find.h	`100.00% <100.00%> (ø)`
...nclude/utils/graph/dataflow_graph/dataflow_graph.h	`44.44% <ø> (-5.56%)`	⬇️
lib/utils/include/utils/graph/digraph/digraph.h	`100.00% <ø> (ø)`
...b/utils/include/utils/graph/digraph/digraph_view.h	`100.00% <ø> (ø)`
...ls/include/utils/graph/multidigraph/multidigraph.h	`100.00% <ø> (ø)`
lib/utils/include/utils/graph/node/graph_view.h	`100.00% <ø> (ø)`
.../include/utils/graph/undirected/undirected_graph.h	`100.00% <ø> (ø)`
...ude/utils/graph/undirected/undirected_graph_view.h	`100.00% <ø> (+20.00%)`	⬆️
...ries_parallel/get_series_parallel_decomposition.cc	`98.14% <100.00%> (-1.86%)`	⬇️
.../utils/graph/series_parallel/parallel_reduction.cc	`100.00% <100.00%> (ø)`
... and 21 more

... and 123 files with indirect coverage changes

…y-testing

Marsella8

Reviewable status: 0 of 41 files reviewed, 1 unresolved discussion (waiting on @lockshaw)

lib/utils/src/utils/graph/series_parallel/series_parallel_decomposition.cc line 82 at r2 (raw file):

}

bool is_empty(Node const &node) {

borrowed from #1562, will fix them there if needed

…eedup

…ting

lockshaw

Reviewed 23 of 32 files at r1, 12 of 16 files at r2, 7 of 9 files at r3, all commit messages.
Reviewable status: 42 of 47 files reviewed, 35 unresolved discussions (waiting on @Marsella8)

lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc line 10 at r3 (raw file):

using namespace FlexFlow;

TEST_CASE("get_connected_components") {

Suggestion:

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("get_connected_components") {

lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc line 41 at r3 (raw file):

  }

  SUBCASE("3 components") {

Also test the case where everything is one component

Also test the case where the graph is empty

lib/utils/test/src/utils/graph/digraph/algorithms/get_dominators.cc line 34 at r3 (raw file):

      std::unordered_set<Node> correct = {n.at(0)};
      CHECK(correct == result);
    }

Suggestion:

    SUBCASE("small acyclic graph") {
      std::vector<Node> n = add_nodes(g, 4);
      std::vector<DirectedEdge> e = {
          DirectedEdge{n.at(0), n.at(3)},
          DirectedEdge{n.at(0), n.at(1)},
          DirectedEdge{n.at(0), n.at(2)},
          DirectedEdge{n.at(1), n.at(2)},
      };
      add_edges(g, e);

      SUBCASE("get_dominators(..., Node const &)") {
        Node node = n.at(2);
        std::unordered_set<Node> correct = {n.at(0), n.at(2)};
        std::unordered_set<Node> result = get_dominators(g, node);
        CHECK(correct == result);
      }
  
      SUBCASE("get_dominators(..., std::unordered_set<Node> const &)") {
        std::unordered_set<Node> nodes = {n.at(1), n.at(3)};
        std::unordered_set<Node> result = get_dominators(g, nodes);
        std::unordered_set<Node> correct = {n.at(0)};
        CHECK(correct == result);
      }
    }

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 20 at r3 (raw file):

std::unordered_multiset<Node> get_nodes(Node const &);

bool is_empty(Node const &node);

How would you end up with an empty SP decomposition? That doesn't seem like it should be allowed (i.e., might be a good idea to change the container in the splits to be some wrapper type of std::vector that requires itself to be non-empty)

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 28 at r3 (raw file):

SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
                                        Node const &node);

How is this a well-behaved operation? Like, what are the semantics of this? Are you sure we want this function as part of the API for SeriesParallelDecomposition?

Code quote:

SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
                                        Node const &node);

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 30 at r3 (raw file):

                                        Node const &node);

// duplicate nodes within `sp` are counted multiple times

Doxygen style please 🙂

Code quote:

// duplicate nodes within `sp` are counted multiple times

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 33 at r3 (raw file):

size_t num_nodes(SeriesParallelDecomposition const &sp);

SeriesParallelDecomposition serial_composition(

For consistency, never use "serial" in the FF codebase, always use "series"

Suggestion:

SeriesParallelDecomposition series_composition(

lib/utils/include/utils/graph/series_parallel/series_reduction.h line 19 at r3 (raw file):

std::unordered_set<std::vector<MultiDiEdge>>
    find_all_extended_series_reductions(MultiDiGraphView const &g);

What is an "extended series reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation

Also, it might be nice to create an actual datatype (e.g., ExtendedSeriesReduction) that is a wrapper type around std::vector<MultiDiEdge> rather than just using a raw vector.

Code quote:

    find_all_extended_series_reductions(MultiDiGraphView const &g);

lib/utils/test/src/utils/graph/undirected/undirected.cc line 15 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE_TEMPLATE(

Why are there two tests here?

lib/utils/test/src/utils/graph/undirected/undirected.cc line 18 at r3 (raw file):

      "UndirectedGraph implementations", T, HashmapUndirectedGraph) {

    RC_SUBCASE("Full", [&]() {

Why is this using rapidcheck?

lib/utils/src/utils/graph/views/views.cc line 2 at r3 (raw file):

#include "utils/graph/views/views.h"
#include "utils/bidict/algorithms/right_entries.h"

I'm not seeing `right_entries used anywhere here in new code. Was this already necessary and just missing, or can this be removed?

lib/utils/src/utils/graph/views/views.cc line 70 at r3 (raw file):

UndirectedEdge to_undirected_edge(DirectedEdge const &e) {
  return UndirectedEdge{{e.src, e.dst}};

Does commutative_pair not have an explicit constructor?

Code quote:

{e.src, e.dst}

lib/utils/src/utils/graph/views/views.cc line 122 at r3 (raw file):

  std::unordered_set<UndirectedEdge> undirected_edges =
      set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
                g.query_edges(UndirectedEdgeQuery{q.dsts}));

Can't you just union the query_sets (i.e., q.srcs and q.dsts) and then do the query, rather than unioning the query results?

Code quote:

  std::unordered_set<UndirectedEdge> undirected_edges =
      set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
                g.query_edges(UndirectedEdgeQuery{q.dsts}));

lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc line 12 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("get_unchecked_dfs_ordering") {

IIRC the semantics of unchecked_dfs_ordering are essentially that it is free to visit nodes multiple times (i.e., in a diamond graph of 4 nodes (a -> b, a -> c, b -> d, c -> d), it would either yield [a, b, d, c, d] or [a, c, d, b, d]--it would be good to add a testcase for this

lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc line 24 at r3 (raw file):

      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
      CHECK(correct == result);
    }

Suggestion:

    SUBCASE("simple path") {
      std::vector<Node> correct = {n[0], n[1], n[2], n[3]};
      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
      CHECK(correct == result);
    }
    
    SUBCASE("start from non-initial node") {
      std::vector<Node> correct = {n[1], n[2], n[3]};
      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[1]});
      CHECK(correct == result);
    }

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 11 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("DiGraph - algorithms.cc") {

Prefer one TEST_CASE per function, rather than one SUBCASE per function

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 16 at r3 (raw file):

    std::vector<Node> n = add_nodes(g, 4);
    std::vector<DirectedEdge> e = {
        DirectedEdge{n[0], n[1]},

Prefer .at for bounds checking

Suggestion:

        DirectedEdge{n.at(0), n.at(1)},

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 55 at r3 (raw file):

    }

    SUBCASE("get_sinks") {

Potentially rename to get_terminal_nodes/get_initial_nodes following https://reviewable.io/reviews/flexflow/flexflow-train/1565#-OGCWYwtE_rcz9e1WizH

lib/utils/test/src/utils/containers/contains.cc line 19 at r3 (raw file):

      std::unordered_set<int> s = {1, 2, 3, 4, 5};
      CHECK(contains(s, 3));
      CHECK(!contains(s, 6));

Doesn't doctest require this to use CHECK_FALSE? iirc doctest has an issue with CHECK(!...)

Code quote:

      CHECK(!contains(s, 6));

lib/utils/test/src/utils/graph/views/views.cc line 49 at r3 (raw file):

      CHECK(result == expected);
    }
  }

Probably the best way to test these (due to the complexity of the querying interface) would be to use rapidcheck to generate a bunch of operations and then ensure the behaviors are the same between the view and the non-view graph that would be generated. This is going to be annoying to write, but without restructuring things this is probably the correct approach. The good thing is that the equivalence testing code only needs to be written once per graphview type and then can be reused for each of the views.

Code quote:

  TEST_CASE("UndirectedSubgraphView") {
    UndirectedGraph g = UndirectedGraph::create<HashmapUndirectedGraph>();
    std::vector<Node> n = add_nodes(g, 5);
    add_edges(g,
              {UndirectedEdge{{n.at(0), n.at(3)}},
               UndirectedEdge{{n.at(1), n.at(1)}},
               UndirectedEdge{{n.at(1), n.at(2)}},
               UndirectedEdge{{n.at(1), n.at(3)}},
               UndirectedEdge{{n.at(2), n.at(3)}},
               UndirectedEdge{{n.at(2), n.at(4)}}});
    std::unordered_set<Node> sub_nodes = {n.at(0), n.at(1), n.at(3)};
    UndirectedGraphView view = view_subgraph(g, sub_nodes);

    SUBCASE("get_nodes") {
      std::unordered_set<Node> expected = {n.at(0), n.at(1), n.at(3)};

      std::unordered_set<Node> result = get_nodes(view);

      CHECK(result == expected);
    }

    SUBCASE("get_edges") {
      std::unordered_set<UndirectedEdge> expected = {
          UndirectedEdge{{n.at(0), n.at(3)}},
          UndirectedEdge{{n.at(1), n.at(1)}},
          UndirectedEdge{{n.at(1), n.at(3)}},
      };

      std::unordered_set<UndirectedEdge> result = get_edges(view);

      CHECK(result == expected);
    }
  }

lib/utils/include/utils/graph/series_parallel/parallel_reduction.h line 15 at r3 (raw file):

    find_parallel_reduction(MultiDiGraphView const &);

std::unordered_map<DirectedEdge, std::unordered_set<MultiDiEdge>>

What is an "extended parallel reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation

Also, it might be nice to create an actual datatype (e.g., ExtendedParallelReduction) that is a wrapper type around std::unordered_set<MultiDiEdge> rather than just using a raw unordered set

If this is just for accelerating the SP decomposition algorithm's rewriting step, it might be better to put these functions (and the corresponding ones for "extended series reductions" in that file rather than in the general "parallel_reduction.h" header)

lib/utils/test/src/utils/graph/multidigraph/algorithms/get_outgoing_edges.cc line 35 at r3 (raw file):

    SUBCASE("node has no outgoing edges") {
      std::unordered_set<MultiDiEdge> result = get_outgoing_edges(g, n.at(2));

Would be nice for this node to have some incoming edges to ensure they're not accidentally picked up

lib/utils/test/src/utils/graph/multidigraph/algorithms/get_incoming_edges.cc line 31 at r3 (raw file):

    SUBCASE("node has no incoming edges") {
      std::unordered_set<MultiDiEdge> result = get_incoming_edges(g, n.at(2));

Would be nice for this node to have some outgoing edges to ensure they're not accidentally picked up

lib/utils/include/utils/graph/node/node.struct.toml line 9 at r3 (raw file):

  "fmt",
  "json",
  "rapidcheck",

How are you using a default Arbitrary instance for Node? I have no issue adding generators for Node, but the default instance seems unlikely to be helpful as Node is only meaningful in the context of a graph?

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 263 at r3 (raw file):

          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {
          {e[0], e[1], e[2]}};

Prefer .at for bounds checking

Suggestion:

          {e.at(0), e.at(1), e.at(2)}};

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 267 at r3 (raw file):

    }

    SUBCASE("2 linear strands") {

Suggestion:

    SUBCASE("2 linear strands that share a common terminal node") {

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 277 at r3 (raw file):

      std::unordered_set<std::vector<MultiDiEdge>> result =
          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {{e[0], e[2]},

Prefer .at for bounds checking

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 300 at r3 (raw file):

          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {
          {e[0], e[2], e[7]}, {e[3], e[6]}, {e[5], e[9]}};

Prefer .at for bounds checking

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 372 at r3 (raw file):

        std::unordered_set<MultiDiEdge> result_edges = get_edges(g);
        std::unordered_set<MultiDiEdge> correct_edges = [&] {
          std::unordered_set<MultiDiEdge> new_edges = unordered_set_of(e);

Why not just set_minus here?

lib/utils/include/utils/graph/multidigraph/algorithms/get_outgoing_edges.h line 12 at r3 (raw file):

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_outgoing_edges(MultiDiGraphView const &g);

Suggestion:

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_outgoing_edges_map(MultiDiGraphView const &g);

lib/utils/include/utils/graph/multidigraph/algorithms/get_incoming_edges.h line 12 at r3 (raw file):

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_incoming_edges(MultiDiGraphView const &g);

Suggestion:

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_incoming_edges_map(MultiDiGraphView const &g);

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 27 at r3 (raw file):

      DirectedEdgeQuery result = directed_edge_query_all();

      CHECK(matches_edge(result, DirectedEdge{n.at(0), n.at(1)}));

Do you really need 5 checks here? It feels like the behavior of directed_edge_query_all could have been tested on a much smaller graph (or even without a graph at all, using rapidcheck). This feels like unnecessary reusing of g, which is why I recommend doing one TEST_CASE per function instead of one SUBCASE per function unless you have very good reason.

In fact, I don't see how g is necessary for any of the cases in this file?

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 38 at r3 (raw file):

          DirectedEdgeQuery{query_set{n.at(0)}, query_set{n.at(1)}};

      CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));

Check that it doesn't match reversed edges

Code quote:

      CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 57 at r3 (raw file):

        CHECK(result == correct);
      }
      SUBCASE("intersection with std::nullopt") {

Suggestion:

      SUBCASE("intersection with matchall") {

lib/utils/src/utils/graph/series_parallel/series_reduction.cc line 79 at r3 (raw file):

  }
  return unordered_set_of(values(strands));
}

Newlines please--this is really hard to read

Code quote:

std::unordered_set<std::vector<MultiDiEdge>>
    find_all_extended_series_reductions(MultiDiGraphView const &g) {
  std::unordered_map<Node, std::unordered_set<MultiDiEdge>> incoming_edges =
      get_incoming_edges(g);
  std::unordered_map<Node, std::unordered_set<MultiDiEdge>> outgoing_edges =
      get_outgoing_edges(g);
  std::unordered_map<Node, std::vector<MultiDiEdge>> strands;
  std::unordered_map<Node, Node> node_to_head_of_strand;
  for (Node const &n : get_topological_ordering(g)) {
    if ((incoming_edges.at(n).size() == 1) &&
        (outgoing_edges.at(n).size() == 1)) {
      MultiDiEdge incoming = get_only(incoming_edges.at(n));
      MultiDiEdge outgoing = get_only(outgoing_edges.at(n));
      Node pre = g.get_multidiedge_src(incoming);
      if (contains_key(node_to_head_of_strand, pre)) {
        Node head = node_to_head_of_strand.at(pre);
        node_to_head_of_strand.emplace(n, head);
        strands.at(head).push_back(outgoing);
      } else {
        node_to_head_of_strand.emplace(n, n);
        strands[n].push_back(incoming);
        strands[n].push_back(outgoing);
      }
    }
  }
  return unordered_set_of(values(strands));
}

Pietro Max Marsella added 2 commits October 9, 2024 21:18

Graph Testing initial cleanup

2be849a

fmt

0e8c962

Marsella8 requested a review from lockshaw October 10, 2024 04:25

Pietro Max Marsella added 5 commits October 14, 2024 17:09

removed unneccesary views, fixed views adjacent bugs

8e11b0b

minor optimization

87e767c

fmt

44b32f8

Merge remote-tracking branch 'origin/repo-refactor' into graph-librar…

a773b20

…y-testing

small fix

e7055ad

Marsella8 marked this pull request as ready for review October 18, 2024 22:02

get_series_parallel_decomposition fix

8564b8b

lockshaw changed the base branch from repo-refactor to master December 16, 2024 08:37

Marsella8 commented Dec 19, 2024

View reviewed changes

Pietro Max Marsella added 2 commits December 20, 2024 14:55

updates to graph documentation + get_series_parallel_decomposition sp…

8a04a2e

…eedup

Merge remote-tracking branch 'flexflow/master' into graph-library-tes…

7d262a8

…ting

lockshaw requested changes Jan 10, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Graph Library #1521

Graph Library #1521

Marsella8 commented Oct 10, 2024 •

edited

Loading

codecov bot commented Oct 10, 2024 •

edited

Loading

Marsella8 left a comment

lockshaw left a comment

Graph Library #1521

Are you sure you want to change the base?

Graph Library #1521

Conversation

Marsella8 commented Oct 10, 2024 • edited Loading

codecov bot commented Oct 10, 2024 • edited Loading

Codecov Report

Marsella8 left a comment

Choose a reason for hiding this comment

lockshaw left a comment

Choose a reason for hiding this comment

Marsella8 commented Oct 10, 2024 •

edited

Loading

codecov bot commented Oct 10, 2024 •

edited

Loading